3. LINKAGE RESULTS, 2006-11-16, 2006 PANEL

TABLE 1 - LINKAGE RATES, By Selected Characteristics
(b) Includes Other Territories.
(c) Includes Migratory areas.

The linkage rates for the 2006 ACLD panel were relatively consistent across most sub-populations and were in line with expected results. Compared with the overall linkage rate of 76%, the sub-populations which achieved the highest linkage rates for the 2006-2011 linkage were persons:
The same sub-populations had the highest linkage rates when linking through the three Census periods:
The sub-populations which achieved the lowest 2006-2011 linkage rates were persons:
The lowest 2006-2011-2016 linkage rate by sub-population was for those aged 75 years and over (29%), while the Northern Territory (48%) had the lowest linkage rate by state.

Most sub-populations followed a trend in their linkage rates across the three Census periods, although the rates for certain sub-populations fell considerably. Of the records for persons aged 15 to 19 in 2006, 74% linked to the 2011 Census, but this dropped to 51% when linking to the 2016 Census. This is likely due to the high level of mobility as persons enter the 20 to 29 age range.

Traditionally, the Census Post Enumeration Survey (PES) has shown that the Census has higher rates of undercount for people of Aboriginal and/or Torres Strait Islander origin, those aged between 20 and 29 and those in the Northern Territory. As expected, the groups with lower ACLD linkage rates broadly aligned with the groups that experience higher levels of undercount in the 2016 Census. One additional group that had lower linkage rates was persons aged 75 and over at the time of the 2006 Census who, due to age, had an increased risk of death over the ensuing ten years.

Further information on Census undercount can be found in Census of Population and Housing - Details of Undercount, 2011 (cat. no. 2940.0) and Census of Population and Housing: Details of Overcount and Undercount, 2016 (cat. no. 2940.0). Further data cubes demonstrating the linkage rates for various sub-populations are available as an attachment to this Information paper.

3.1 LINKAGE ACCURACY

The following quality measures were calculated for the ACLD and indicate a good level of overall quality:
3.1.1 Linkage Precision

Not all record pairs assigned as links in a data linkage process are true matches, that is, record pairs belonging to the same individual. While the methodology is designed to ensure that the vast majority of links are true, some are actually false, i.e. the records in the link belong to different people rather than the same person.

The linkage strategy used for the ACLD was designed to ensure a high level of accuracy while also achieving a sufficiently high number of links to enable longitudinal research. Accordingly, the strategy was restrictive and conservative.

One of the key measures of linkage quality is the proportion of links in the dataset that are false. The number of false links can be estimated by clerically reviewing a sample of links, or by using modelling techniques. Once an estimate of the number of false links is obtained, a 'precision' can be calculated. The precision is an estimate of the proportion of links that are matches (i.e. belonging to the same entity).

With clerical review unavailable for the re-link of the 2006 panel, the model designed by Chipperfield et al. (2018), known as the Feasibility Calculator (FC), was used as the primary method of calculating precision and setting a cut-off for the 2006-2011 re-link of the 2006 Panel sample. The FC uses the theory developed by Fellegi & Sunter (1969) to conduct a record linkage simulation multiple times in order to estimate precision. The FC then compiles the results of these simulated linkages to calculate the lowest linkage weight at the desired level of precision in each probabilistic linkage pass. These results can then be used to inform a single cut-off point for probabilistic linkage results.
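As an illustration of the linkage weights referred to above, the following is a minimal Python sketch of how a Fellegi & Sunter style weight is commonly computed for a record pair. It is not the Feasibility Calculator and not the ABS implementation. The function names, the field names and the m and u probabilities are all hypothetical values chosen for the example.

```python
import math

def field_weight(m: float, u: float, agrees: bool) -> float:
    """Log2 weight contributed by one linking field.

    m: probability the field agrees when the two records are a true match
    u: probability the field agrees when the two records are not a match
    """
    if agrees:
        return math.log2(m / u)            # agreement adds evidence for a match
    return math.log2((1 - m) / (1 - u))    # disagreement subtracts evidence

def pair_weight(comparison: dict, probabilities: dict) -> float:
    """Total weight for a record pair: the sum of its field weights."""
    return sum(
        field_weight(*probabilities[field], agrees)
        for field, agrees in comparison.items()
    )

# Hypothetical (m, u) probabilities for three linking fields.
PROBS = {
    "sex": (0.98, 0.5),
    "birth_year": (0.95, 0.0125),
    "mesh_block": (0.80, 0.001),
}

all_agree = pair_weight(
    {"sex": True, "birth_year": True, "mesh_block": True}, PROBS
)
moved_house = pair_weight(
    {"sex": True, "birth_year": True, "mesh_block": False}, PROBS
)
print(f"all fields agree: {all_agree:.2f}")
print(f"geography disagrees: {moved_house:.2f}")
```

A pair is accepted as a link only if its total weight is at or above the chosen cut-off, so a lower cut-off yields more links at the cost of lower precision.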
Without name information, distinguishing a unique link becomes more difficult. To ensure a high quality linkage while maintaining a high linkage rate, the desired estimated cumulative precision was therefore set at 95%, equivalent to an estimated false link rate of approximately 5%. This method achieved a 77.2% linkage rate when linking the 2006 Panel to 2011 Census records.

Precision estimation for the 2011-2016 linkage of the ACLD involved conducting clerical review on a stratified random sample of links. Potential links were stratified by their link weight value, with a minimum of 5% of links sampled from each individual link weight value (after rounding down to the nearest integer). The results of the clerical review were used to calculate precision estimates for links grouped by pass and rounded link weight value, which were then applied to the entire set of linkage results. This provided an estimate of precision for each individual link, which can be referred to as 'marginal precision', and is the likelihood of a single link being 'true' (i.e. the records belonging to the same person). Using the marginal precision, the 'cumulative precision' of the final set of one-to-one links could be estimated, i.e. the overall precision of the linked dataset.

After producing both marginal and cumulative precision estimates, a cut-off point was selected. This cut-off is intended to optimise both the number of links and the cumulative precision of the links retained above the cut-off point, while at the same time maintaining a high level of marginal precision for every individual link above the cut-off. The marginal precision estimates were used to select the cut-off, with all links with a marginal precision of at least 81% being retained. This resulted in a final file of 605,626 links once the cut-off was applied, with an estimated cumulative precision of 98.6%, or a false link rate of 1.4%, for these links.
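The relationship between marginal precision, cumulative precision and the cut-off can be sketched in Python as below. This is a simplified illustration only. It groups links by rounded link weight alone (the ACLD estimates were also grouped by pass), and the `Stratum` class, the function names and all counts are hypothetical. Only the 81% threshold and the 5% sampling rate are taken from the text.

```python
from dataclasses import dataclass

@dataclass
class Stratum:
    weight: int        # rounded link weight value defining the stratum
    n_links: int       # number of links with this weight
    sampled: int       # links sent to clerical review
    judged_true: int   # sampled links the reviewer judged to be matches

def marginal_precision(s: Stratum) -> float:
    """Estimated chance that a single link in this stratum is a true match."""
    return s.judged_true / s.sampled

def apply_cutoff(strata, min_marginal=0.81):
    """Keep strata at or above the marginal precision threshold.

    Returns (links retained, estimated cumulative precision of those links).
    """
    kept = [s for s in strata if marginal_precision(s) >= min_marginal]
    total = sum(s.n_links for s in kept)
    if total == 0:
        return 0, 0.0
    expected_true = sum(s.n_links * marginal_precision(s) for s in kept)
    return total, expected_true / total

# Hypothetical review results, each stratum sampled at 5%.
strata = [
    Stratum(weight=30, n_links=1000, sampled=50, judged_true=50),
    Stratum(weight=20, n_links=400, sampled=20, judged_true=19),
    Stratum(weight=15, n_links=200, sampled=10, judged_true=8),
]

retained, cumulative = apply_cutoff(strata)
print(f"{retained} links retained, cumulative precision {cumulative:.1%}")
```

In this made-up data the lowest stratum has a marginal precision of 80% and is dropped. Cumulative precision is the link-weighted average of the marginal precisions of the retained strata, so it can be well above the marginal threshold.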
Clerical review relies upon judgment by a well-trained individual. While efforts are taken to minimise the risk, it is therefore possible for a link to be incorrectly assigned as a match or non-match. The method for measuring precision developed by Chipperfield et al. (2018) was used to provide an independent model-based estimate of the precision. While the clerical estimate of cumulative precision for the 2011-2016 linkage was 98.6%, the model-based approach estimated the precision to be over 99%. The precision as estimated by the clerical review process was retained as the more conservative estimate.

Table 2 provides a summary of the precision estimate and false link rate by the pass where each link was selected (estimated via clerical review) for the 2011-2016 linkage.

TABLE 2 - PRECISION ESTIMATES AND FALSE LINK RATES, By Pass Number, 2011-2016 linkage (2006 Panel)
Over three quarters of all links were achieved in the first pass of the project (78.4% for 2006-2011 and 75.5% for 2011-2016), which used a deterministic linking methodology to identify and filter matches. This pass implemented tight geographic and demographic restrictions to maximise the number of high quality links assigned and to limit the number of alternative comparisons required. Using this approach, links were only accepted if a single unique record pair was identified.

3.1.2 Consistency of Common Information on Record Pairs

In data linkage projects, geographic boundaries function as blocking variables that restrict the search for links to records which agree on the defined geography. They are also used as linking variables, and when combined with other linking fields (such as hashed name (2011-2016 only), age, sex and date of birth), they provide a high level of uniqueness and reduce the likelihood of linking to an incorrect record.

Tables 3a and 3b display the number of records that had consistent information on key linking variables, grouped by levels of geography.

TABLE 3a - CONSISTENCY OF LINKED RECORDS, By Geography And Selected Linking Fields, 2006 Panel, 2006-2011 linkage
(b) Categories are mutually exclusive. Records that agree in each category are excluded from subsequent categories.
(c) Percentages may not add up to the total due to rounding.

TABLE 3b - CONSISTENCY OF LINKED RECORDS, By Geography And Selected Linking Fields, 2006 Panel, 2011-2016 linkage
(b) Categories are mutually exclusive. Records that agree in each category are excluded from subsequent categories.
(c) Percentages may not add up to the total due to rounding.

By contrast, record pairs may have inconsistent information and yet be a match. Inconsistent information may be recorded for the same person in different Censuses due to a range of factors, including:
Of particular note is inconsistency due to non-reporting of name and date of birth in the 2011 Census and the 2016 Census. Respondents are becoming less likely to provide their date of birth, with the proportion reporting it falling from 90% in the 2011 Census to 81% in the 2016 Census. Further, just over one per cent of Australians had a missing, or blank, response for first name or surname in the 2016 Census.

There appeared to be a relationship between having a missing response for both first name and surname and non-response on other variables. Of the people who did not report first name and surname, approximately half did not report at least one of sex, age, or Indigenous status. The vast majority of missing responses came from paper forms, with the overall level of missing responses in the 2016 Census remaining low.

3.1.3 Comparison with the original 2006 Panel linkage

Table 4 compares the final results of the original 2006 Panel linkage with the revised linkage.

TABLE 4 - COMPARISON OF LINKAGE RESULTS, 2006-2011
While the linkage rate has reduced, there is greater confidence in the precision of the links achieved in the re-link of the 2006 Panel, due to the enhanced linking methodology implemented for the linkage. Over the entire panel, 81.6% of records achieved the same result in both linkages (same link identified, or not linked). The changes in links can be viewed in Table 5.

TABLE 5 - STATUS OF LINKS, 2006-2011
(a) Data presented in the table have not been perturbed.

3.2 CHARACTERISTICS OF LINKED AND UNLINKED 2006 ACLD PANEL SAMPLE

The random sample selected from the 2006 Census for the 2006 ACLD Panel was designed to be representative of the Australian population by age, sex and jurisdiction as well as other characteristics such as Indigenous status and country of birth. Table 6 shows the distribution of key populations across the 2006 Census, the 2006 ACLD Panel sample and the 2006-2011 linked results.

TABLE 6 - SELECTED CHARACTERISTICS, By 2006 Census, 2006 ACLD Panel Sample, 2006-2011 ACLD Linked Results
(b) Data presented in the table have been perturbed. As a result the sum of individual categories may not align with totals.
(c) Includes Other Territories.
(d) Includes Migratory areas.

The distribution of the ACLD file by sub-population was generally well aligned with both the 2006 Panel sample and the entire 2006 Census. When looking at the relative difference between these proportions, however, some differences are more clearly observed. Compared with the entire 2006 Census, the linked 2006 ACLD Panel contains relatively more records for people aged 40-49 and 50-59 years, and to a lesser extent those aged 60-69 years. By contrast, the linked 2006 Panel contains relatively fewer records for people aged 20-29 years and 80 years and over. This applies to both the 2006-2011 and 2006-2011-2016 linkages, with the latter showing larger proportional differences when compared to the 2006 Census.

In general, the distribution of weighted counts for the linked ACLD file is close to that of the entire 2006 Census, but it should be noted that the weighting process is not designed to produce counts corresponding to the population in 2006. Rather, the weighted population is that of people who were in scope of both the 2006 and 2011 Censuses for the 2006-2011 linkage, and of people who were in scope of the 2006, 2011 and 2016 Censuses for the 2006-2011-2016 linkage (see Section 3.4 Weighting). Thus, for example, the lower proportion of older people in the linked file, even after weighting, reflects the impact on the 2006 Panel sample of deaths that occurred between the 2006, 2011 and 2016 Censuses.

Further data cubes demonstrating more detailed population distributions are provided as an attachment to this Information paper.

3.3 REASONS FOR UNLINKED RECORDS

There are two main reasons why records from the 2006 Panel sample were not linked to a 2011 Census and/or 2016 Census record:
3.3.1 Missing and/or inconsistent information

In these cases, the true match was present in the pool of all record pairs but it was not identified because there was a high level of inconsistency between information on each Census, or key linking fields were missing altogether. The reasons for the match being missed can be categorised into the following groups:
Accurate address coding was crucial in narrowing the search and differentiating between true and false links. It was a particular challenge for persons who had moved, since linkage was then heavily dependent on accurate recall and detailed information supplied in the 2011 and 2016 Censuses about the person’s address five years previously. Processing for the 2011 and 2016 Censuses involved coding address five years ago to a fine level of geography, ideally Mesh Block. This was not always possible, due to insufficient and/or incorrect address information being supplied for some persons, potentially due to recall issues.

3.3.2 No 2011 or 2016 Census Record

According to mortality data compiled by the ABS from data supplied by the Registrars of Births, Deaths and Marriages, approximately 700,000 people died in Australia between 2006 and 2011 and approximately 913,000 between 2011 and 2016. If 5% of these people were selected in the 2006 Panel sample, then it could be estimated that up to 35,000 people could not have been linked due to death between 2006 and 2011.

Similarly, migration data estimates that just over one million people left Australia as permanent emigrants between 2006 and 2011 (and just over 1.4 million between 2011 and 2016), potentially resulting in up to 50,000 people from the 2006 Panel sample being unlikely to have a corresponding 2011 Census record. For more information please refer to the relevant releases of Migration, Australia (cat. no. 3412.0) and Deaths, Australia (cat. no. 3302.0).

The net undercount rate was 1.7% for the 2011 Census and 1% for the 2016 Census, with higher rates for Aboriginal and Torres Strait Islander people than for the non-Indigenous population. Thus approximately 15,000 people from the 2006 Panel sample could have been missed in the 2011 Census. This estimate is a starting point only and does not take into account the likelihood of people being missed in successive Censuses.
For more information please refer to Census of Population and Housing - Details of Undercount, 2011 (cat. no. 2940.0) for the 2011 Census and Census of Population and Housing: Details of Overcount and Undercount, 2016 (cat. no. 2940.0) for the 2016 Census.

The number of links achieved in the 2011-2016 linkage of the 2006 Panel is approximately 3.5% of the 2011 Census population. Applying the factors outlined above, approximately 913,000 people died between 2011 and 2016, so it could be estimated that almost 32,000 people could not have been linked due to death between 2011 and 2016. Similarly, migration data estimates that just over 1.4 million people left Australia as permanent emigrants between 2011 and 2016, potentially resulting in approximately 49,000 people being unlikely to have a corresponding 2016 Census record due to migration between 2011 and 2016. For more information please refer to the relevant releases of Migration, Australia (cat. no. 3412.0) and Deaths, Australia (cat. no. 3302.0).

Taking into account the net undercount rate of 1% for the 2016 Census, it is estimated that almost 8,000 persons may have been missed in the 2016 Census and may therefore be missing a corresponding 2016 Census record.
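The death and emigration estimates in this section follow from simple proportional arithmetic, which can be reproduced with the short Python sketch below. It assumes, as the text does, that deaths and departures fall on the panel in proportion to its share of the population. The function name is hypothetical, and the "just over" migration figures are rounded down to one million and 1.4 million.

```python
def expected_in_panel(population_events: int, panel_share: float) -> int:
    """Events expected among panel members, assuming the panel is affected
    in proportion to its share of the whole population."""
    return round(population_events * panel_share)

# 2006-2011: the 2006 Panel is a 5% sample.
deaths_2006_2011 = expected_in_panel(700_000, 0.05)
emigrants_2006_2011 = expected_in_panel(1_000_000, 0.05)

# 2011-2016: linked records are approximately 3.5% of the 2011 Census population.
deaths_2011_2016 = expected_in_panel(913_000, 0.035)
emigrants_2011_2016 = expected_in_panel(1_400_000, 0.035)

print(deaths_2006_2011, emigrants_2006_2011)   # 35000 50000
print(deaths_2011_2016, emigrants_2011_2016)   # 31955 49000
```

These values correspond to the figures of up to 35,000, up to 50,000, almost 32,000 and approximately 49,000 quoted above. The undercount estimates are not reproduced here because the base populations they were applied to are not stated in this section.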